[ceph_migrate] trigger mgr failover when cluster health is degraded #1127
base: main
/lgtm
86d4a6d to fc7df71
/retest-required
fc7df71 to 6d41a6a
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED
I marked the PR as draft because there are points that I need to discuss with @fmount first to ensure the robustness of the procedure.
b8e070b to c44b232
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a6bc2713a27e4392ad35a38b052d75ff ✔️ noop SUCCESS in 0s
c44b232 to 46909aa
ceph_migrate refactor restart mgr handler as task to fix role usage
f7c69e3 to 3321964
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5e74b248cbf849bcb51544e4395ba3fd ✔️ noop SUCCESS in 0s
recheck
Add a task in post.yaml to trigger Ceph manager failover when the cluster health status is not HEALTH_OK. This helps recover from degraded cluster states after migration operations. The task installs the cephadm package on all ComputeHCI nodes and then executes 'cephadm shell -- ceph mgr fail' on the first compute node. This approach avoids container-based CLI complexity and uses the native cephadm tool available on compute nodes where Ceph daemons are running. The task only runs when ComputeHCI nodes are available and the cluster health is degraded (HEALTH_WARN or HEALTH_ERR). Signed-off-by: Roberto Alfieri <[email protected]>
3321964 to cca2563
Add a task in `post.yaml` to trigger Ceph manager failover when the cluster health status is not `HEALTH_OK`. This helps recover from degraded cluster states after migration operations.

The task installs the `cephadm` package on all `ComputeHCI` nodes and then executes `cephadm shell -- ceph mgr fail` on the first compute node. This approach avoids container-based CLI complexity and uses the native cephadm tool available on compute nodes where Ceph daemons are running.

The task only runs when ComputeHCI nodes are available and the cluster health is degraded (HEALTH_WARN or HEALTH_ERR).
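
For readers who want to picture the flow described above, here is a minimal sketch written as a standalone play. It is not the `post.yaml` from this PR. It assumes an inventory group named `ComputeHCI`, that `cephadm shell` can reach the cluster from those nodes, and that `ceph health --format json` is an acceptable way to read the status; the registered variable name `ceph_health_raw` is hypothetical.

```yaml
---
# Illustrative sketch only; the real task in this PR may differ.
# Assumed: inventory group "ComputeHCI"; variable name ceph_health_raw.
- name: Fail over the Ceph mgr when the cluster is not healthy
  hosts: ComputeHCI        # the play is skipped when the group has no hosts
  become: true
  tasks:
    - name: Install cephadm on every ComputeHCI node
      ansible.builtin.package:
        name: cephadm
        state: present

    - name: Read cluster health once, on the first node of the play
      ansible.builtin.command: cephadm shell -- ceph health --format json
      register: ceph_health_raw
      changed_when: false   # read-only query, never reports a change
      run_once: true

    - name: Trigger mgr failover unless the cluster reports HEALTH_OK
      ansible.builtin.command: cephadm shell -- ceph mgr fail
      run_once: true
      when: (ceph_health_raw.stdout | from_json).status != 'HEALTH_OK'
```

Using `run_once: true` is one way to express "on the first compute node": Ansible runs the task on the first host of the play and shares the registered result with the rest. Inside a role task file, the same effect would more likely be achieved with `delegate_to`.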